Overview of FIRE-2015 Shared Task on Mixed Script Information Retrieval

Authors

Royal Sequiera

Monojit Choudhury

Parth Gupta

Paolo Rosso

Shubham Kumar

Somnath Banerjee

Sudip Kumar Naskar

Sivaji Bandyopadhyay

Gokul Chittaranjan

Amitava Das

Kunal Chakma

Abstract

The Transliterated Search track was organized for the third year in FIRE-2015. The track had three subtasks. Subtask I was on language labeling of words in code-mixed text fragments; it was conducted for eight Indian languages (Bangla, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, and Telugu), each mixed with English. Subtask II was on ad-hoc retrieval of Hindi film lyrics, movie reviews, and astrology documents, where both the queries and the documents were in Hindi written either in Devanagari or in Roman transliterated form. Subtask III was on transliterated question answering, where the documents as well as the questions were in Bangla script or Roman transliterated Bangla. A total of 24 runs were submitted by 10 teams, of which 14 runs were for Subtask I and 10 runs for Subtask II. There was no participation in Subtask III. The overview presents a comprehensive report of the subtasks, datasets, submitted runs, and performances.

Similar resources

Mixed Script Ad hoc Retrieval using back transliteration and phrase matching through bigram indexing: Shared Task report by BIT, Mesra

This paper describes an approach for mixed-script ad hoc retrieval, a subtask of the FIRE 2015 Shared Task on Mixed Script Information Retrieval. We participated in subtask 2 of the shared task, where a statistical model was used to carry out back transliteration to Devanagari script. To perform the search, a bigram-based index of the documents was used and search was performed using pivot t...

DA-IICT in FIRE 2015 Shared Task on Mixed Script Information Retrieval

This paper describes the methodology followed by Team Watchdogs in their submission for the shared task on Mixed Script Information Retrieval (MSIR) in FIRE 2015. I participated in subtasks 1 (Query Word Labelling) and 2 (Mixed-script Ad hoc retrieval). For subtask 1, a machine learning approach using a CRF classifier was used to classify the tokens as one of the possible languages using ...

Overview of the Mixed Script Information Retrieval (MSIR) at FIRE-2016

The shared task on Mixed Script Information Retrieval (MSIR) was organized for the fourth year in FIRE-2016. The track had two subtasks. Subtask-1 was on question classification, where questions were in code-mixed Bengali-English and Bengali was written in transliterated Roman script. Subtask-2 was on ad-hoc retrieval of Hindi film song lyrics, movie reviews and astrology documents, where both t...

Adaptive Voting in Multiple Classifier Systems for Word Level Language Identification

In social media communication, code switching has become quite a common phenomenon, especially for multilingual speakers. Automatic language identification becomes both a necessary and a challenging task in such an environment. In this work, we describe a CRF-based system with a voting approach for code-mixed query word labeling at word level as part of our participation in the shared task on Mixed ...

AmritaCEN_NLP @ FIRE 2015 Language Identification for Indian Languages in Social Media Text

The growth of social media content, such as Twitter and Facebook messages and blog posts, has created many new opportunities for language technology. User-generated content such as tweets and blogs in most languages is written using Roman script due to distinct social culture and technology. Some of it uses the language's own script or a mixed script. The primary challenges ...

Publication date: 2015
